Skip to content

Conversation

ciarams87
Copy link
Contributor

@ciarams87 ciarams87 commented Sep 30, 2025

What this PR does / why we need it:

The mesh weight conformance tests were executing 500 separate kubectl exec commands with random delays, resulting in very slow test execution.

This PR implements an optimization using the echo client's --count flag to execute all 500 requests in a single batch, reducing test time.

Testing:
Ran the mesh tests on istio in a GKE cluster

Performance improvement varies by environment - my tests showed the time went from 60-70s -> 3-4s per test (~95% faster)

Current main:

--- PASS: TestConformance (201.69s)
    --- PASS: TestConformance/MeshGRPCRouteWeight (64.10s)
        --- PASS: TestConformance/MeshGRPCRouteWeight/Requests_should_have_a_distribution_that_matches_the_weight (63.72s)
    --- PASS: TestConformance/MeshHTTPRouteWeight (66.91s)
        --- PASS: TestConformance/MeshHTTPRouteWeight/Requests_should_have_a_distribution_that_matches_the_weight (66.54s)
PASS

With the batch requests:

--- PASS: TestConformance (91.34s)
    --- PASS: TestConformance/MeshGRPCRouteWeight (3.51s)
        --- PASS: TestConformance/MeshGRPCRouteWeight/Requests_should_have_a_distribution_that_matches_the_weight (3.08s)
    --- PASS: TestConformance/MeshHTTPRouteWeight (3.92s)
        --- PASS: TestConformance/MeshHTTPRouteWeight/Requests_should_have_a_distribution_that_matches_the_weight (3.45s)
PASS

Which issue(s) this PR fixes:

Fixes #4101

Does this PR introduce a user-facing change?:

NONE

The mesh weight conformance tests were executing 500 separate kubectl exec
commands with random delays, resulting in very slow test execution. This
optimization uses the echo client's --count flag to execute all 500 requests
in a single batch, dramatically reducing test time.
@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Sep 30, 2025
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: ciarams87
Once this PR has been reviewed and has the lgtm label, please assign liorlieberman for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Sep 30, 2025
@k8s-ci-robot
Copy link
Contributor

Hi @ciarams87. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@rikatz
Copy link
Member

rikatz commented Oct 1, 2025

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Oct 1, 2025
responses := parseMultipleResponses(resp.RawContent)

if len(responses) != count {
tlog.Logf(t, "Warning: expected %d responses but got %d", count, len(responses))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you clarify why this is not an error? if you send 500 requests and gets 499 back, shouldn't this be a problem?

Copy link
Member

@rikatz rikatz Oct 1, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is expected, as I can see you have a "tolerance" configuration below, maybe the tolerance should be part of the MeshPod instance (so each test have a different tolerance) or please add a comment here raising that the caller of RequestBatch expects a tolerance.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, thank you! Changed it to return an error. I made it a warning initially due to uncertainty about parsing edge cases, but you're right that missing responses should fail the test immediately. The tolerance is only for validating the weight distribution (±5%), not for handling missing responses.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks! I will come back to this review soon this week!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. release-note-none Denotes a PR that doesn't merit a release note. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

v1.4.0: new conformance tests are too slow
3 participants